IN THE CLAIMS 

1 (Previously Presented). A method comprising: 

receiving a first program unit in a parallel computing environment having a team 
of parallel threads including at least a first and second thread, the first program unit including a 
memory copy operation to be performed between the first thread and the second thread; 

translating the first program unit into a second program unit, the second program 
unit to associate the memory copy operation with a set of one or more instructions, the set of 
instructions to ensure that the second thread copies data based, in part, on a first descriptor 
associated with the first thread; and 

copying an address of the first descriptor to a two address buffer. 

2 (Previously Presented). The method of claim 1 further comprising: 

copying data into a memory area associated with the second thread based, in part, 
on address and data information associated with the first descriptor. 

3 (Original). The method of claim 2 further comprising copying data into a memory 
area associated with the second thread utilizing, in part, a second descriptor associated with the 
second thread. 

4 (Original). The method of claim 1 further comprising enabling the first thread to copy 
an address of the first descriptor to a buffer and setting a signal to enable the second thread to 
copy data associated with the first descriptor to a memory area associated with the second thread. 

5 (Original). The method of claim 4 further comprising enabling the first thread to enter 
a wait state after the signal is set. 

6 (Original). The method of claim 5 further comprising releasing the first thread from a 
wait state upon completion of the data copy operation by the second thread. 

7 (Original). The method of claim 5 further comprising enabling the first thread to copy 
an address of the first descriptor to one of two buffer areas. 

8 (Original). The method of claim 1 further comprising receiving the first program unit 
in source code format and translating the first program unit into a second program unit in source 
code format. 

9 (Previously Presented). A machine-readable medium that provides instructions, that 
when executed by a machine, enables the machine to perform operations comprising: 

receiving a first program unit in a parallel computing environment, the first 
program unit including a memory copy operation to be performed between a first thread and a 
second thread; 

translating the first program unit into a second program unit, the second program 
unit to associate the memory copy operation with a set of one or more instructions, the set of 
instructions to ensure that the second thread copies data based, in part, on a first descriptor 
associated with the first thread; and 

copying an address of the first descriptor to a two address buffer. 

10 (Previously Presented). The machine-readable medium of claim 9, further 
comprising: 

copying data into a memory area associated with the second thread based, in part, 
on address and data information associated with the first descriptor. 

11 (Original). The machine-readable medium of claim 10, further comprising copying 
data into a memory area associated with the second thread utilizing, in part, a second 
descriptor associated with the second thread. 

12 (Original). The machine-readable medium of claim 9, further comprising enabling the 
first thread to copy an address of the first descriptor to a buffer and setting a signal to enable the 
second thread to copy data associated with the first descriptor to a memory area associated with 
the second thread. 

13 (Original). The machine-readable medium of claim 12, further comprising enabling 
the first thread to enter a wait state after the signal is set. 

14 (Original). The machine-readable medium of claim 13, further comprising releasing 
the first thread from a wait state upon completion of the data copy operation by the second 
thread. 

15 (Original). The machine-readable medium of claim 13, further comprising enabling 
the first thread to copy an address of the first descriptor to one of two buffer areas. 

16 (Original). The machine-readable medium of claim 12, further comprising copying 
data into a memory area associated with the second thread utilizing, in part, a second descriptor 
associated with the second thread. 

17 (Original). The machine-readable medium of claim 9 further comprising receiving the 
first program unit in source code format and translating the first program unit into the second 
program unit in source code format. 

18 (Previously Presented). A method comprising: 

receiving a first program unit in a parallel computing environment and translating 
the first program unit, in part, into one or more computer instructions, the instructions enabling a 
second thread in a team of threads to copy data, into a memory area associated with the second 
thread, from a private memory area associated with a first thread; and 

copying an address of a descriptor into a two address buffer utilized by the second 
thread, in part, to copy data from the memory area associated with the first thread. 

19 (Original). The method of claim 18, further comprising creating a descriptor utilized, 
in part, by the second thread to copy data into the memory area associated with the second 
thread. 

20 (Original). The method of claim 19, further comprising setting a signal by the first 
thread enabling the second thread to copy the data from the memory area associated with the first 
thread. 

21 (Original). The method of claim 20, further comprising entering a wait state by the 
first thread until the second thread copies the data from the memory area associated with the first 
thread. 

22 (Previously Presented). An apparatus comprising: 

a memory including a shared memory location; 

a translation unit coupled with the memory, the translation unit operative to 
associate a first program unit, including a memory copy operation to be performed between a 
first thread and a second thread, with a set of one or more instructions, the set of instructions to 
ensure that the second thread copies data based, in part, on a first descriptor associated with the 
first thread; and 

wherein an address of the first descriptor is copied to a two address buffer by the 
first thread and the second thread copies data into a memory area associated with the second 
thread based, in part, on address and data information associated with the first descriptor. 

Claim 23 (Canceled). 

24 (Previously Presented). The apparatus as in claim 22 wherein the second thread 
copies data into a memory area associated with the second thread utilizing, in part, a second 
descriptor associated with the second thread. 

25 (Original). The apparatus as in claim 22 wherein the first thread copies an address of 
the first descriptor to a buffer and sets a signal to enable the second thread to copy data 
associated with the first descriptor to a memory area associated with the second thread. 

26 (Original). The apparatus as in claim 25 wherein the first thread enters a wait state 
after the signal is set. 



27 (Original). The apparatus of claim 26, wherein the first thread exits the wait state after 
completion of the data copy by the second thread. 

28 (Original). The apparatus of claim 22 wherein the first program unit is in source code 
format. 

29 (Original). The apparatus of claim 28 wherein the first descriptor is passed to the first 
program unit. 

30 (Original). The apparatus as in claim 22 wherein the translation unit translates the 
first program unit, in part, into a second program unit in source code format and the second 
program unit includes the memory copy operation. 